Approximate Nearest Neighbor Search for a Dataset of Normalized Vectors

Authors

Kengo Terasawa

Yuzuru Tanaka

Abstract

This paper describes a novel algorithm for approximate nearest neighbor search. For this problem, especially in high-dimensional spaces, one of the best-known algorithms is Locality-Sensitive Hashing (LSH). This paper presents a variant of the LSH algorithm that outperforms previously proposed methods when the dataset consists of vectors normalized to unit length, which is often the case in pattern recognition. The LSH scheme is based on a family of hash functions that preserves the locality of points. This paper points out that, for this special case, we can design efficient hash functions that map a point on the hypersphere to the closest vertex of a randomly rotated regular polytope. The computational analysis confirmed that the proposed method improves the exponent ρ, the main indicator of the performance of the LSH algorithm. Practical experiments also supported the efficiency of the algorithm in both time and space.

Key words: nearest neighbor, randomized algorithm, locality-sensitive hashing (LSH)
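The hashing idea in the abstract (rotate a unit vector at random, then snap it to the closest vertex of a regular polytope) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the cross-polytope (vertices at plus or minus each coordinate axis) as the regular polytope, and it builds the rotation by Gram-Schmidt on Gaussian vectors. The names `random_rotation`, `polytope_hash`, and `normalize` are hypothetical helpers introduced only for this example.

```python
import math
import random


def random_rotation(dim, rng):
    """Return a random orthogonal dim x dim matrix as a list of rows.

    Built by Gram-Schmidt on Gaussian vectors. This is an illustrative
    choice, not necessarily how the paper samples its rotations.
    """
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Subtract the components along the rows accepted so far.
        for r in rows:
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # resample in the unlikely degenerate case
            rows.append([a / norm for a in v])
    return rows


def polytope_hash(x, rotation):
    """Hash unit vector x to the index of its nearest cross-polytope vertex.

    Assumption: the polytope is the cross-polytope with vertices +/- e_i.
    After rotation, the nearest vertex is the axis with the largest
    absolute coordinate, taken with that coordinate's sign.
    Returns an int in [0, 2 * dim): i for +e_i, dim + i for -e_i.
    """
    y = [sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in rotation]
    i = max(range(len(y)), key=lambda k: abs(y[k]))
    return i if y[i] >= 0 else len(y) + i


def normalize(v):
    """Scale v to unit length, matching the paper's normalized-vector setting."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


if __name__ == "__main__":
    rng = random.Random(0)
    rotation = random_rotation(8, rng)
    p = normalize([rng.gauss(0.0, 1.0) for _ in range(8)])
    # A slightly perturbed copy of p; nearby points tend to share a hash value.
    q = normalize([a + 0.01 * rng.gauss(0.0, 1.0) for a in p])
    print(polytope_hash(p, rotation), polytope_hash(q, rotation))
```

In a full LSH index, several such hashes with independent rotations would typically be concatenated into one bucket key and repeated over multiple tables; those parameters are not given in the abstract and are left out here.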


Similar resources

Learning Better Encoding for Approximate Nearest Neighbor Search with Dictionary Annealing

We introduce a novel dictionary optimization method for high-dimensional vector quantization employed in approximate nearest neighbor (ANN) search. Vector quantization methods first seek a series of dictionaries, then approximate each vector by a sum of elements selected from these dictionaries. An optimal series of dictionaries should be mutually independent, and each dictionary should generat...


Subspace Approximation for Approximate Nearest Neighbor Search in NLP

Most natural language processing tasks can be formulated as the approximate nearest neighbor search problem, such as word analogy, document similarity, and machine translation. Take the question-answering task as an example: given a question as the query, the goal is to search its nearest neighbor in the training dataset as the answer. However, existing methods for approximate nearest neighbor sear...


Polysemous Codes

This paper considers the problem of approximate nearest neighbor search in the compressed domain. We introduce polysemous codes, which offer both the distance estimation quality of product quantization and the efficient comparison of binary codes with Hamming distance. Their design is inspired by algorithms introduced in the 90’s to construct channel-optimized vector quantizers. At search time,...


HDIdx: High-dimensional indexing for efficient approximate nearest neighbor search

Fast Nearest Neighbor (NN) search is a fundamental challenge in large-scale data processing and analytics, particularly for analyzing multimedia contents which are often of high dimensionality. Instead of using exact NN search, extensive research efforts have been focusing on approximate NN search algorithms. In this work, we present “HDIdx”, an efficient high-dimensional indexing library for f...


Improving Bilayer Product Quantization for Billion-Scale Approximate Nearest Neighbors in High Dimensions

The top-performing systems for billion-scale high-dimensional approximate nearest neighbor (ANN) search are all based on two-layer architectures that include an indexing structure and a compressed datapoints layer. An indexing structure is crucial as it avoids exhaustive search, while the lossy data compression is needed to fit the dataset into RAM. Several of the most successful syste...



Journal title:

IEICE Transactions

Volume 92-D

Publication date: 2009
